Evoking agency: Attention model and behavior control in a robotic art installation
Robotic embodiments of artificial agents seem to reinstate a body-mind dualism as a consequence of their technical implementation, but could this supposition be a misconception? The authors present their artistic, scientific and engineering work on a robotic installation, the Articulated Head, and its perception-action control system, the Thinking Head Attention Model and Behavioral System (THAMBS). The authors propose that agency emerges from the interplay of the robot's behavior and the environment and that, in the system's interaction with humans, it is attributed to the robot to the same degree as it is grounded in the robot's actions: agency cannot be instilled; it needs to be evoked.
From Robot Arm to Intentional Agent: the Articulated Head
Robot arms have come a long way from the humble beginnings of the first Unimate robot, installed at a General Motors plant to unload parts from a die-casting machine, to the flexible and versatile tools ubiquitous and indispensable in many fields of industrial production today. The other chapters of this book attest to the progress in the field and the plenitude of applications of robot arms. It is still fair to say, however, that industrial robot arms are currently applied primarily in continuously repeated manufacturing tasks for which they are pre-programmed. They are known for their precision and reliability, but in general they use only limited sensory input, and the changes in the execution of their task due to varying environmental factors are minimal. If one were to compare a robot arm with an animal, even a very simple one, this property of robot arm applications would immediately stand out as one of the most striking differences. Living organisms must sense changes in the environment that are crucial to their survival and must have some flexibility to adjust their behaviour. In most robot arm contexts, such a comparison is currently at best of academic interest, though it might gain relevance very quickly if robot arms are to assist humans to a larger extent than at present. If robot arms are to work in close proximity with humans, directly supporting them in accomplishing a task, it becomes inevitable for the control system of the robot to have far-reaching situational awareness and the capability to adjust its 'behaviour' according to the acquired situational information. In addition, robot perception and action have to conform to a large degree to the expectations of the human co-worker.
A system for video-based analysis of face motion during speech
During face-to-face interaction, facial motion conveys
information at various levels. These include a person's emotional
condition, position in a discourse, and, while speaking, phonetic
details about the speech sounds being produced. Trivially, the
measurement of face motion is a prerequisite for any further analysis
of its functional characteristics or information content. It is
possible to make precise measures of locations on the face using
systems that track the motion by means of active or passive markers
placed directly on the face. Such systems, however, have the
disadvantages of requiring specialised equipment, which restricts their
use outside the lab, and of being invasive in the sense that the markers
have to be attached to the subject's face.
To overcome these limitations we developed a video-based system to
measure face motion from standard video recordings by deforming the
surface of an ellipsoidal mesh fit to the face. The mesh is
initialised manually for a reference frame and then projected onto
subsequent video frames. Location changes (between successive frames)
for each mesh node are determined adaptively within a well-defined
area around each mesh node, using a two-dimensional cross-correlation
analysis on a two-dimensional wavelet transform of the
frames. Position parameters are propagated in three steps from a
coarser mesh and a correspondingly higher scale of the wavelet
transform to the final fine mesh and lower scale of the wavelet
transform. The sequential changes in position of the mesh nodes
represent the facial motion. The method takes advantage of inherent
constraints of the facial surface, which distinguishes it from more
general image motion estimation methods, and it returns measurement
points distributed globally over the facial surface, in contrast to
feature-based methods.
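The per-node matching step can be sketched as follows. This is a minimal, single-scale numpy illustration (normalised cross-correlation over a small search window) without the wavelet transform or the coarse-to-fine mesh propagation described above; the function and its parameters are hypothetical, not the authors' implementation.

```python
import numpy as np

def track_node(prev, curr, node, patch=8, search=4):
    """Estimate the displacement of one mesh node between two frames.

    A template patch around the node in `prev` is compared, via
    normalised cross-correlation, against candidate patches in `curr`
    within a +/-`search` pixel window; the best-scoring offset is the
    node's estimated motion.
    """
    y, x = node
    tmpl = prev[y - patch:y + patch, x - patch:x + patch]
    best_score, best_d = -np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = curr[y + dy - patch:y + dy + patch,
                        x + dx - patch:x + dx + patch]
            score = np.sum(tmpl * cand) / (
                np.linalg.norm(tmpl) * np.linalg.norm(cand) + 1e-9)
            if score > best_score:
                best_score, best_d = score, (dy, dx)
    return best_d  # (dy, dx) displacement of this node
```

In the full method this search would run on wavelet-transformed frames, with coarse-mesh estimates seeding the finer meshes.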
Multisensory Integration Sites Identified by Perception of Spatial Wavelet Filtered Visual Speech Gesture Information
Perception of speech is improved when presentation of the audio signal is accompanied by concordant visual speech gesture information. This enhancement is most prevalent when the audio signal is degraded. One potential means by which the brain affords perceptual enhancement is thought to be through the integration of concordant information from multiple sensory channels in a common site of convergence, multisensory integration (MSI) sites. Some studies have identified potential sites in the superior temporal gyrus/sulcus (STG/S) that are responsive to multisensory information from the auditory speech signal and visual speech movement. One limitation of these studies is that they do not control for activity resulting from attentional modulation cued by such things as visual information signaling the onsets and offsets of the acoustic speech signal, as well as activity resulting from MSI of properties of the auditory speech signal with aspects of gross visual motion that are not specific to place of articulation information. This fMRI experiment uses spatial wavelet bandpass filtered Japanese sentences presented with background multispeaker audio noise to discern brain activity reflecting MSI induced by auditory and visual correspondence of place of articulation information that controls for activity resulting from the above-mentioned factors. The experiment consists of a low-frequency (LF) filtered condition containing gross visual motion of the lips, jaw, and head without specific place of articulation information, a midfrequency (MF) filtered condition containing place of articulation information, and an unfiltered (UF) condition. Sites of MSI selectively induced by auditory and visual correspondence of place of articulation information were determined by the presence of activity for both the MF and UF conditions relative to the LF condition. 
Based on these criteria, sites of MSI were found predominantly in the left middle temporal gyrus (MTG) and the left STG/S (including the auditory cortex). By controlling for additional factors that could also induce greater activity resulting from visual motion information, this study identifies potential MSI sites that we believe are involved in improved speech perception intelligibility.
Learning the Mapping Function from Voltage Amplitudes to Sensor Positions in 3D-EMA Using Deep Neural Networks
The first generation of three-dimensional Electromagnetic Articulography
devices (Carstens AG500) suffered from occasional
critical tracking failures. Although now superseded by
new devices, the AG500 is still in use in many speech labs
and many valuable data sets exist. In this study we investigate
whether deep neural networks (DNNs) can learn the mapping
function from raw voltage amplitudes to sensor positions based
on a comprehensive movement data set. This is compared with
previous methods, which arrive at individual position values
sample by sample via direct optimisation. We found that
with appropriate hyperparameter settings a DNN was able to
approximate the mapping function with good accuracy, leading
to a smaller error than the previous methods, but that the
DNN-based approach was not able to solve the tracking problem
completely.
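The general idea of fitting a network to an amplitude-to-position mapping can be illustrated as follows. This numpy sketch is not the authors' architecture, and the toy forward model stands in for the real AG500 field equations; every function, size and hyperparameter here is invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the field model: 6 transmitter "coils" at
# fixed locations, with amplitude falling off smoothly with distance.
centres = rng.normal(size=(6, 3)) * 2.0

def amplitudes(pos):                      # pos: (n, 3) sensor positions
    d = np.linalg.norm(pos[:, None, :] - centres[None, :, :], axis=-1)
    return 1.0 / (1.0 + d ** 2)           # (n, 6) voltage amplitudes

# Synthetic training data: random positions and their amplitudes
P = rng.uniform(-1.0, 1.0, size=(2000, 3))
A = amplitudes(P)

# One-hidden-layer MLP trained with plain gradient descent on MSE
W1 = rng.normal(size=(6, 64)) * 0.3; b1 = np.zeros(64)
W2 = rng.normal(size=(64, 3)) * 0.3; b2 = np.zeros(3)

losses, lr = [], 0.05
for step in range(500):
    H = np.tanh(A @ W1 + b1)              # hidden layer
    Y = H @ W2 + b2                       # predicted positions
    err = Y - P
    losses.append(float(np.mean(err ** 2)))
    gY = 2.0 * err / len(A)               # backprop through the MSE
    gW2, gb2 = H.T @ gY, gY.sum(0)
    gH = gY @ W2.T * (1.0 - H ** 2)       # tanh derivative
    gW1, gb1 = A.T @ gH, gH.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
```

The training loss shrinks steadily on this toy mapping; the paper's point is that even a well-fitted network of this kind did not eliminate all tracking failures.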
Analysis of tongue configuration in multi-speaker, multi-volume MRI data
MRI data of German vowels and consonants were acquired for 9 speakers. In this paper, tongue contours for the vowels were analyzed using the three-mode factor analysis technique PARAFAC. After some difficulties, probably related to what constitutes an adequate speaker sample for this three-mode technique to work, a stable two-factor solution was extracted that explained about 90% of the variance. Factor 1 roughly captured a dimension running from low back to high front; Factor 2 one from mid front to high back. These factors are compared with earlier PARAFAC-based models. These analyses were based on midsagittal contours; the paper concludes by illustrating from coronal and axial sections how non-midline information could be incorporated into this approach.
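A rank-R PARAFAC (CP) decomposition of a three-way array, such as a speakers x contour-points x vowels array, can be sketched with a generic alternating-least-squares loop; this is the textbook formulation in plain numpy, not the authors' exact fitting procedure, and the array shapes are assumptions.

```python
import numpy as np

def parafac_als(X, rank, iters=300, seed=0):
    """Rank-`rank` PARAFAC (CP) decomposition of a 3-way array by
    alternating least squares: X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r].
    Minimal sketch, without normalisation or convergence checks."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.normal(size=(I, rank))
    B = rng.normal(size=(J, rank))
    C = rng.normal(size=(K, rank))
    # Mode-wise unfoldings of X (C-order: trailing index varies fastest)
    X1 = X.reshape(I, J * K)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)

    def khatri_rao(U, V):
        # Column-wise Kronecker product: row (u,v) is U[u] * V[v]
        return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

    for _ in range(iters):
        A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C
```

On a noiseless rank-2 array this recovers an essentially exact fit; on real articulatory data the explained variance (about 90% in the study above) is the quantity of interest.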
Thinking head: Towards human centred robotics
The Thinking Head project is a multidisciplinary approach to building intelligent agents for human-machine interaction. The Thinking Head Framework evolved out of the project; it facilitates loose coupling between the various components and forms the central nervous system of a multimodal perception-action system. The paper presents the overall architecture, the components and the attention system. It then concludes with a preliminary behavioral experiment that studies the intelligibility of the audiovisual speech output produced by the Embodied Conversational Agent (ECA) that is part of the system. These results provide the baseline for future evaluations of the system as the project progresses through multiple evaluate-and-refine cycles.
Evaluation of the measurement precision in three-dimensional Electromagnetic Articulography (Carstens AG500)
Three-dimensional Electromagnetic Articulography (EMA) measures the location and orientation of the moving speech articulators in real time by means of small, wired sensors. We evaluated the measurement accuracy of the Carstens AG500 EMA system using data acquired simultaneously with the Vicon optical motion tracking system (OPT). EMA sensors and OPT markers were combined in a single rigid object so that the location and orientation of the EMA sensors could be predicted from OPT motion tracking data. The error was computed as the root mean squared (RMS) error. We found that deviations from constant inter-sensor distances (relative error) were in general below 1 mm and 0.6° while the difference between the measured and estimated positions (absolute error) ranged between 1 and 2 mm and 0.5° and 0.7°. By examining error patterns, four critical orientation regions were detected, but no discernible location-dependent error patterning. Sensor velocity appeared to have little impact. The RMS error of the original position calculation was not found to be a reliable predictor. In the absence of a clear error structure we recommend careful analysis of unexpected findings in speech production data acquired with EMA. Avenues for further improvement of the system are discussed.
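The two error measures distinguished above can be written down directly; a small numpy sketch with hypothetical variable names:

```python
import numpy as np

def absolute_rms(measured, predicted):
    """RMS of Euclidean distances between paired 3-D sensor positions
    (the 'absolute error': measured vs. OPT-estimated locations)."""
    d = np.linalg.norm(measured - predicted, axis=1)
    return float(np.sqrt(np.mean(d ** 2)))

def relative_rms(pos_a, pos_b, nominal_dist):
    """RMS deviation of the measured inter-sensor distance from its
    known constant value on the rigid object (the 'relative error')."""
    d = np.linalg.norm(pos_a - pos_b, axis=1)
    return float(np.sqrt(np.mean((d - nominal_dist) ** 2)))
```

The relative error needs no external ground truth, only the rigid-body constraint; the absolute error requires the simultaneously recorded OPT trajectories.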
Using sensor orientation information for computational head stabilisation in 3D Electromagnetic Articulography (EMA)
We propose a new, simple algorithm that makes use of the sensor orientation information in 3D Electromagnetic Articulography (EMA) for computational head stabilisation. The algorithm also provides a well-defined procedure for the case where only two sensors are available for head motion tracking, and it allows position coordinates and orientation angles to be combined for head stabilisation with equal weighting of each kind of information. An evaluation showed that the method using the orientation angles produced the most reliable results.
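The core two-sensor idea can be sketched as follows: build an orthonormal head frame from the two reference-sensor positions plus one sensor's orientation vector, then express every articulator point in that frame, cancelling rigid head motion. This is one possible construction under stated assumptions, not necessarily the published algorithm in detail.

```python
import numpy as np

def head_frame(p1, p2, ori):
    """Orthonormal head frame from two reference-sensor positions and
    one sensor's orientation vector (ori must not be parallel to the
    inter-sensor axis). Rows of the returned matrix are the axes."""
    x = p2 - p1
    x = x / np.linalg.norm(x)
    y = ori - np.dot(ori, x) * x          # Gram-Schmidt against x
    y = y / np.linalg.norm(y)
    z = np.cross(x, y)
    return np.stack([x, y, z])

def stabilise(point, p1, p2, ori):
    """Express an articulator point in the head-centred frame,
    removing rigid head translation and rotation."""
    R = head_frame(p1, p2, ori)
    origin = 0.5 * (p1 + p2)              # midpoint as frame origin
    return R @ (point - origin)
```

Because the frame rotates and translates with the head, the stabilised coordinates of any point rigidly attached to the head configuration are invariant under arbitrary rigid motions of the whole setup.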
Are there compensatory effects in natural speech?
This work exploited coarticulation and loud speech as natural sources of perturbation in order to determine whether articulatory covariation (motor equivalent behavior) can be observed in speech that is not artificially perturbed. Articulatory analyses of jaw and tongue movement in the production of alveolar consonants by German speakers were performed. The sibilant /s/ shows virtually no articulatory covariation under the influence of natural perturbations, whereas other alveolar consonants show more obvious compensatory behavior. Our conclusion is that an effect of natural sources of perturbation is noticeable, but sounds are affected to different degrees.
- …